Add --inductor flag to example_ds3_pp with FORCE_BALANCED_ROUTING #361

Merged
xmfan merged 4 commits into main from xmfan/stack/31 on Mar 17, 2026
@xmfan xmfan commented Mar 10, 2026

Stacked PRs:


Add --inductor flag to example_ds3_pp with FORCE_BALANCED_ROUTING

The DSv3 MoE implementation uses .tolist() and data-dependent grouped_mm
offsets that Inductor cannot compile. Add FORCE_BALANCED_ROUTING to the
model that makes token-per-expert counts uniform and uses balanced
all-to-all splits, eliminating all data-dependent ops.
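A minimal sketch of why forced balanced routing removes the data dependence, assuming the token count, top-k, and expert count are known ahead of time. The function names below are hypothetical and are not taken from the PR; the point is only that uniform counts, their cumulative offsets, and equal all-to-all splits can all be computed from static integers without reading router output back to the host.

```python
from itertools import accumulate


def balanced_tokens_per_expert(num_tokens: int, top_k: int, num_experts: int) -> list[int]:
    """Uniform per-expert token counts, computed without inspecting router output."""
    total_slots = num_tokens * top_k  # each token is routed to top_k experts
    if total_slots % num_experts != 0:
        raise ValueError("num_tokens * top_k must be divisible by num_experts")
    return [total_slots // num_experts] * num_experts


def group_offsets(tokens_per_expert: list[int]) -> list[int]:
    """Cumulative end offset of each expert's token group."""
    return list(accumulate(tokens_per_expert))


def balanced_all_to_all_splits(total_slots: int, ep_size: int) -> list[int]:
    """Equal split sizes for an all-to-all across ep_size ranks (hypothetical helper)."""
    if total_slots % ep_size != 0:
        raise ValueError("total_slots must be divisible by ep_size")
    return [total_slots // ep_size] * ep_size


# Illustrative sizes only: 16 tokens, top-2 routing, 8 experts, 4 ranks.
counts = balanced_tokens_per_expert(num_tokens=16, top_k=2, num_experts=8)
offsets = group_offsets(counts)
splits = balanced_all_to_all_splits(total_slots=sum(counts), ep_size=4)
```

Because every value here depends only on static sizes, nothing in this path needs a `.tolist()` call on a tensor produced at runtime.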

The --inductor CLI flag enables both Inductor compilation and forced
balanced routing together.
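One way a single flag could drive both settings is sketched below. This is an assumption about the wiring, not the PR's actual code: the boolean `store_true` form of `--inductor`, the `parse_args` helper, and the `resolve_settings` dictionary keys are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical example-runner command line."""
    parser = argparse.ArgumentParser(description="hypothetical example runner")
    parser.add_argument(
        "--inductor",
        action="store_true",  # assumption: the flag is a plain boolean switch
        help="compile with Inductor and force balanced MoE routing",
    )
    return parser.parse_args(argv)


def resolve_settings(args):
    """Derive both settings from the single flag so they cannot diverge."""
    return {
        "inductor": args.inductor,
        "force_balanced_routing": args.inductor,
    }


enabled = resolve_settings(parse_args(["--inductor"]))
disabled = resolve_settings(parse_args([]))
```

Tying the two settings to one flag avoids the combination of Inductor with data-dependent routing, which the description says cannot compile.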

Authored with Claude.

xmfan added a commit that referenced this pull request Mar 10, 2026
stack-info: PR: #361, branch: xmfan/stack/31
@meta-cla meta-cla bot added the CLA Signed label Mar 10, 2026
@xmfan xmfan changed the base branch from xmfan/stack/30 to main March 10, 2026 21:57
xmfan added a commit that referenced this pull request Mar 10, 2026
@xmfan xmfan changed the base branch from main to xmfan/stack/30 March 10, 2026 21:57
@xmfan xmfan requested a review from sanketpurandare March 12, 2026 06:07
@xmfan xmfan marked this pull request as ready for review March 12, 2026 06:07
@sanketpurandare sanketpurandare left a comment

Can you add a CI test for this? We can use the same example PP test with --inductor True?

@xmfan xmfan changed the base branch from xmfan/stack/30 to main March 16, 2026 23:41
xmfan added a commit that referenced this pull request Mar 16, 2026
@xmfan xmfan changed the base branch from main to xmfan/stack/30 March 16, 2026 23:41
@sanketpurandare sanketpurandare self-requested a review March 17, 2026 00:23
@xmfan xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 00:28
xmfan added a commit that referenced this pull request Mar 17, 2026
@xmfan xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 00:28
@xmfan xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 00:57
xmfan added a commit that referenced this pull request Mar 17, 2026
xmfan added a commit that referenced this pull request Mar 17, 2026
@xmfan xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 00:57
xmfan added 3 commits March 16, 2026 20:46
The module-level `dispatcher.sharding_propagator = CustomShardingPropagator()`
was leaking into other test files (e.g. test_api.py) when run in the same
pytest process, causing `aten.copy_` failures because the custom propagator
doesn't have rules for ops that the default DTensor propagator handles.

test_dtensor.py's two test classes (ImplicitRegistrationTest, DimShardingTest)
inherit from DTensorTestBase which uses MultiProcessTestCase -- each test
spawns subprocesses that re-import the module. Those subprocesses don't run
pytest fixtures, so they need the custom propagator installed at module level.
We gate the module-level install on `multiprocessing.current_process().name`
to only run in spawned workers, and use a module-scoped autouse pytest fixture
to install/restore the propagator in the main process.
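The gating and restore pattern described above might look roughly like the following. This is a sketch with stand-ins: `Dispatcher`, the string propagators, and the `"MainProcess"` comparison are assumptions for illustration, and the context manager stands in for what the commit describes as a module-scoped autouse pytest fixture.

```python
import multiprocessing
from contextlib import contextmanager


class Dispatcher:
    """Stand-in for the real dispatcher object (hypothetical)."""

    def __init__(self):
        self.sharding_propagator = "default"


dispatcher = Dispatcher()


def in_spawned_worker() -> bool:
    """True when this module is imported by a spawned worker, not the main process."""
    # Assumption: the main process keeps multiprocessing's default name.
    return multiprocessing.current_process().name != "MainProcess"


# Module-level install only in spawned workers, which never run pytest fixtures.
if in_spawned_worker():
    dispatcher.sharding_propagator = "custom"


@contextmanager
def custom_propagator(target, propagator):
    """Install a propagator and always restore the original afterwards.

    In a test module this body would sit inside a fixture declared with
    @pytest.fixture(scope="module", autouse=True).
    """
    original = target.sharding_propagator
    target.sharding_propagator = propagator
    try:
        yield target
    finally:
        target.sharding_propagator = original
```

Restoring in `finally` is what keeps the custom propagator from leaking into other test files that run later in the same process.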

Authored with Claude.

stack-info: PR: #367, branch: xmfan/stack/32

Add _execute_graph() that lazily compiles graph modules with
compile_fx_inner on first invocation. Controlled by an inductor kwarg
threaded through all _run_* functions.

GraphPPRunner accepts inductor=True and propagates it to all
GraphPipelineStage instances, which the stage_* action functions
read when calling _run_*.
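A rough sketch of compile-on-first-call, assuming a cache keyed by the graph object. The names `execute_graph`, `stand_in_compile`, and `_compiled` are hypothetical; the stand-in compile function only records that it was called, whereas the commit message says the real code calls `compile_fx_inner`.

```python
from typing import Any, Callable

# Cache of compiled callables, keyed by the original graph callable.
_compiled: dict[Callable[..., Any], Callable[..., Any]] = {}


def execute_graph(graph_fn, args, compile_fn, inductor=False):
    """Run graph_fn directly, or through a lazily compiled version when inductor=True."""
    if not inductor:
        return graph_fn(*args)
    if graph_fn not in _compiled:
        # First invocation: compile once using the current inputs as examples.
        _compiled[graph_fn] = compile_fn(graph_fn, args)
    return _compiled[graph_fn](*args)


compile_calls = []


def stand_in_compile(graph_fn, example_inputs):
    """Stand-in for a real compiler; records the call and returns graph_fn unchanged."""
    compile_calls.append(graph_fn)
    return graph_fn


def add(a, b):
    return a + b


first = execute_graph(add, (1, 2), stand_in_compile, inductor=True)
second = execute_graph(add, (3, 4), stand_in_compile, inductor=True)
eager = execute_graph(add, (1, 1), stand_in_compile)
```

Deferring compilation to the first call means graphs that are never executed are never compiled, and the first call supplies real inputs to use as examples.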

Authored with Claude.

stack-info: PR: #360, branch: xmfan/stack/30

The DSv3 MoE implementation uses .tolist() and data-dependent grouped_mm
offsets that Inductor cannot compile. Add FORCE_BALANCED_ROUTING to the
model that makes token-per-expert counts uniform and uses balanced
all-to-all splits, eliminating all data-dependent ops.

The --inductor CLI flag enables both Inductor compilation and forced
balanced routing together.

Authored with Claude.

stack-info: PR: #361, branch: xmfan/stack/31
@xmfan xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 03:49
@xmfan xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 03:49
@xmfan xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 14:34
@xmfan xmfan changed the base branch from main to xmfan/stack/30 March 17, 2026 14:35
@xmfan xmfan changed the base branch from xmfan/stack/30 to main March 17, 2026 14:36
@xmfan xmfan merged commit 2d7ab90 into main Mar 17, 2026
10 checks passed